Aligning a Parallel English-Chinese Corpus Statistically with Lexical Criteria

نویسنده

  • Dekai Wu
چکیده

We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-speciic lexical cues.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Aligning Parallel English-chinese Texts Statistically with Lexical Criteria

We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improve...

متن کامل

Aligning Parallel Bilingual Corpora Statistically with Punctuation Criteria

We present a new approach to aligning sentences in bilingual parallel corpora based on punctuation, especially for English and Chinese. Although the length-based approach produces high accuracy rates of sentence alignment for clean parallel corpora written in two Western languages, such as French-English or German-English, it does not work as well for parallel corpora that are noisy or written ...

متن کامل

SemEval-2007 Task 11: English Lexical Sample Task via English-Chinese Parallel Text

We made use of parallel texts to gather training and test examples for the English lexical sample task. Two tracks were organized for our task. The first track used examples gathered from an LDC corpus, while the second track used examples gathered from a Web corpus. In this paper, we describe the process of gathering examples from the parallel corpora, the differences with similar tasks in pre...

متن کامل

Cross Sentence Alignment for Structurally Dissimilar Corpus Based on Singular Value Decomposition

Extracting the alignment pairs is a critical step for constructing bilingual corpus knowledge base for Example Based Machine Translation Systems. Different methods have been proposed in aligning parallel corpus between two different languages. However, most of them focus on structurally similar languages like English-French. This paper presents a method of cross aligning Portuguese-Chinese bili...

متن کامل

Bilingual Sentence Alignment Based on Punctuation Marks

We present a new approach to aligning English and Chinese sentences in parallel corpora based solely on punctuations. Although the length based approach produces high accuracy rates of sentence alignment for clean parallel corpora written in two Western languages such as French-English and German-English, it does not fair as well for parallel corpora that are noisy or written in two distant lan...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994